Bias and Variance Approximation in Value Function Estimates

Authors

  • Shie Mannor
  • Duncan Simester
  • Peng Sun
  • John N. Tsitsiklis
Abstract

We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
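The effect described in the abstract can be observed numerically. The following is a minimal sketch, not the paper's closed-form approximation: it assumes a fixed policy (so the decision process reduces to a Markov reward process), known rewards, and transition probabilities estimated from a fixed number of sampled transitions per state. The function names, the two-state example, and all parameter values are hypothetical and chosen only for illustration. Bias and variance are measured by brute-force Monte Carlo repetition.

```python
import random

def value_function(P, r, gamma, iters=300):
    """Evaluate a fixed policy by iterating V <- r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[i] + gamma * sum(P[i][j] * V[j] for j in range(n))
             for i in range(n)]
    return V

def estimate_transitions(P, n_samples, rng):
    """Empirical transition matrix from n_samples draws per state."""
    n = len(P)
    P_hat = []
    for i in range(n):
        counts = [0] * n
        for j in rng.choices(range(n), weights=P[i], k=n_samples):
            counts[j] += 1
        P_hat.append([c / n_samples for c in counts])
    return P_hat

def bias_and_variance(P, r, gamma, n_samples, n_reps, seed=0):
    """Monte Carlo bias and sample variance of the plug-in value estimate."""
    rng = random.Random(seed)
    n = len(r)
    V_true = value_function(P, r, gamma)
    estimates = [
        value_function(estimate_transitions(P, n_samples, rng), r, gamma)
        for _ in range(n_reps)
    ]
    means = [sum(e[i] for e in estimates) / n_reps for i in range(n)]
    bias = [means[i] - V_true[i] for i in range(n)]
    var = [sum((e[i] - means[i]) ** 2 for e in estimates) / (n_reps - 1)
           for i in range(n)]
    return V_true, bias, var

# Hypothetical two-state example (values are illustrative assumptions).
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
V_true, bias, var = bias_and_variance(P, r, gamma=0.9, n_samples=50, n_reps=500)
print("true values:", V_true)
print("estimated bias:", bias)
print("estimated variance:", var)
```

Because the value function is a nonlinear function of the transition matrix, the plug-in estimate is generally biased even when the transition estimates themselves are unbiased. Rerunning the sketch with a larger `n_samples` should shrink both quantities.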

Related resources

Regularized Autoregressive Multiple Frequency Estimation

The paper addresses the problem of tracking multiple frequencies using a Regularized Autoregressive (RAR) approximation. The RAR procedure reduces approximation bias compared to other AR-based frequency detection methods, while still providing competitive variance of sample estimates. We show that the RAR estimates of multiple periodicities are consistent in probabilit...

Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management

We consider the problem of estimating the value of a multiattribute resource, where the attributes are categorical or discrete in nature and the number of potential attribute vectors is very large. The problem arises in approximate dynamic programming when we need to estimate the value of a multiattribute resource from estimates based on Monte-Carlo simulation. These problems have been traditio...

An Application of Non-response Bias Reduction Using Propensity Score Methods

In many statistical studies, some units do not respond to some or all of the questions. This situation causes a problem called non-response. Bias and variance inflation are two important consequences of non-response in surveys. Although increasing the sample size can prevent variance inflation, it cannot necessarily adjust for the non-response bias. Therefore a number of methods ...

Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...

Bias-Induced Optical Absorption of Current Carrying Two-Orbital Quantum Dot with Strong Electron-Phonon Interaction (Polaron Regime)

The one photon absorption (OPA) cross section of a current carrying two-orbital quantum dot (QD) with strong electron-phonon interaction (polaron regime) is considered. Using the self-consistent non-equilibrium Hartree-Fock (HF) approximation, we determine the dependence of OPA cross section on the applied bias voltage, the strength of effective electron-electron interaction, and level spacing ...

Journal:
  • Management Science

Volume 53

Publication date: 2007